Shader (realtime, logical)
For other shader types related to computer science, see Shader.
A shader is essentially a computer program[1] executed in a special environment[2][3]. This article specifically covers realtime shaders, which are shaders meant to execute on consumer-level GPUs. Although shaders were introduced for graphics-related tasks, which still account for most of their applications, they can also be used for more generic computation, just as generic programs can be used to compute arbitrary data. As the computational power of GPUs continues to rise faster than that of conventional CPUs, interest in shader programming keeps growing. Using shaders for generic computation requires rethinking algorithms or problems to fit the stream processing paradigm.
The goal of this article is to provide a look at the most important concepts concerning shaders in the most important APIs, such as OpenGL and Direct3D. The reader is assumed to be proficient with 3D graphics, a graphics API and fourth-generation shading pipelines.
Shaders alone control a large part of the workings of a programmable graphics pipeline and thus the final appearance of an object. However, they are not the only entities involved in defining its exact behaviour. The resources being used, as well as the settings of other pipeline stages, still have a great influence on the final result.
Generic shader
A generic shader replaces a specific stage of the shading pipeline[4][5][6] with a user-defined program to be executed on demand (hereafter, the kernel). Shaders generally run in parallel, with limited communication between different executions (hereafter, instances), usually restricted to simplified first-derivative computation and cache optimizations. Being simply a sequence of operations, kernels are defined using special programming languages tailored to the need for explicit parallelization and efficiency. Various shading languages have been designed for this purpose.
Depending on the stage being replaced, a shader fetches specific data, while its output is handed to successive stages. Input data is typically read-only and falls into two main types, uniform and varying, with samplers forming a special class of uniforms:
- uniform input holds constant across all kernel instances of the same draw call. The application can easily set uniform values between draw calls, but there is no way to change a uniform value on a per-instance basis. Uniform values are loaded by calling specific API functions[7][8][9].
- samplers are special uniforms used to access textures. Typically, a sampler identifier specifies a texture sampling unit in the pipeline (to be used for texture lookup operations), which is in turn bound to a texture. Kernels usually employ samplers much like objects[10][11][12]. The intended usage model differs somewhat depending on the shading language being used.
- varying input is typically the result of a previous computational stage, sometimes bound to special, context-dependent semantics. For example, vertex positions are typical varying inputs to vertex shaders (where they are named attributes), while pixel texture coordinates are typical varying inputs to pixel shaders.
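The three kinds of input can be imitated on the CPU. The following is only a minimal sketch, not the API of any real graphics library: the names Uniforms, texture_lookup, kernel and draw_call are hypothetical, textures are reduced to 1D lists, and the sampler is modelled as a plain integer selecting a texture unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uniforms:
    """Read-only values shared by every kernel instance of one draw call."""
    scale: float   # ordinary uniform value
    sampler: int   # sampler: index of the texture unit to read from


def texture_lookup(texture_units, sampler, coord):
    """Nearest-texel lookup in the 1D texture bound to unit `sampler`.

    `coord` is expected in the range [0, 1].
    """
    texels = texture_units[sampler]
    index = min(int(coord * len(texels)), len(texels) - 1)
    return texels[index]


def kernel(uniforms, texture_units, coord):
    """One kernel instance; `coord` is its varying input."""
    return uniforms.scale * texture_lookup(texture_units, uniforms.sampler, coord)


def draw_call(uniforms, texture_units, stream):
    """Run one independent kernel instance per element of the input stream."""
    return [kernel(uniforms, texture_units, coord) for coord in stream]


# Textures bound to two texture units (illustrative data only).
texture_units = {0: [10.0, 20.0], 1: [1.0, 2.0, 3.0, 4.0]}
coords = [0.0, 0.5, 1.0]

# Uniforms may change between draw calls, never between instances of one call.
first = draw_call(Uniforms(scale=1.0, sampler=0), texture_units, coords)
second = draw_call(Uniforms(scale=2.0, sampler=1), texture_units, coords)
```

Here first and second are computed from the same varying stream; only the uniform values, including the sampler, differ between the two draw calls.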
The output is conceptually always varying (although two instances may actually output the same value). Fourth-generation shading pipelines allow control over how output interpolation is performed[13] when primitives are rasterized and the pixel shader's varying input is generated.
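As a rough illustration of what controlling interpolation means, the sketch below derives one pixel's varying input from the outputs of the three vertices of a triangle. It is an assumption-laden simplification: the function name and the mode strings are hypothetical, perspective correction is ignored, and the "flat" mode arbitrarily picks the first vertex, whereas the vertex actually used depends on the API and its settings.

```python
def interpolate(vertex_values, weights, mode="linear"):
    """Build a pixel shader's varying input from three vertex outputs.

    `weights` are the barycentric coordinates of the pixel in the triangle.
    """
    if mode == "flat":
        # No interpolation: the whole primitive takes one vertex's value.
        return vertex_values[0]
    if mode == "linear":
        # Weighted average of the three vertex outputs.
        return sum(w * v for w, v in zip(weights, vertex_values))
    raise ValueError(f"unknown interpolation mode: {mode}")


values = [0.0, 4.0, 8.0]        # one scalar output per vertex
weights = (0.25, 0.25, 0.5)     # pixel position inside the triangle

smooth = interpolate(values, weights, "linear")
flat = interpolate(values, weights, "flat")
```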
Vertex shader
A vertex shader replaces part of the geometry stage[14] of a graphics pipeline. Vertex shaders consume the vertices filled in by the Input Assembly stage, applying the specified kernel "for each vertex". The result, which usually includes an affine transform, is then fetched by the next stage, the Primitive Assembly stage. A vertex shader always produces a single transformed "vertex"[15] and runs on a vertex processor[16].
Producing a vertex position for further rasterization is the typical task of the vertex shader[17].
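The "for each vertex" model with an affine transform can be sketched on the CPU as follows. This is an illustrative assumption rather than real shader code: the dictionary layout of a vertex and the names mat_vec and vertex_kernel are hypothetical, and the transform is an arbitrary example.

```python
def mat_vec(matrix, vector):
    """Multiply a 4x4 row-major matrix by a 4-component column vector."""
    return tuple(sum(m * x for m, x in zip(row, vector)) for row in matrix)


def vertex_kernel(transform, vertex):
    """Kernel run once per input vertex; returns exactly one output vertex."""
    x, y, z = vertex["position"]
    position = mat_vec(transform, (x, y, z, 1.0))  # homogeneous coordinates
    # Other attributes simply pass through in this sketch.
    return {"position": position, "color": vertex["color"]}


# Affine transform: uniform scale by 2, then translation by (1, 0, 0).
transform = (
    (2.0, 0.0, 0.0, 1.0),
    (0.0, 2.0, 0.0, 0.0),
    (0.0, 0.0, 2.0, 0.0),
    (0.0, 0.0, 0.0, 1.0),
)

vertices = [
    {"position": (0.0, 0.0, 0.0), "color": "red"},
    {"position": (1.0, 2.0, 3.0), "color": "green"},
]

# One kernel instance per vertex; the output count equals the input count.
transformed = [vertex_kernel(transform, v) for v in vertices]
```

The transform plays the role of a uniform, since it is the same for every vertex of the call, while each vertex is the varying input of its own instance.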
Note that the current meaning of "vertex" may or may not match the intuitive idea of a vertex. In general, it is better to think of a "vertex" as the basic input data set. This is especially important for generic processing, in which a vertex may hold attributes that do not map to any "geometrical" meaning.
Although vertex shaders were the first hardware-accelerated shader type with a high degree of flexibility (see GeForce3, Radeon R200), their feature set differed considerably from that of other stages for a long time. Even where the exposed instruction set can be considered unified, the performance characteristics of vertex processing units can differ markedly from those of other execution units. Historically, branching has been considerably more efficient and flexible on vertex processors. Similarly, until fourth-generation pipelines, dynamic array indexing was possible only on vertex processors.
Geometry shader
Geometry shaders replace the part of the geometry stage that follows the Primitive Assembly stage and precedes Rasterization. Unlike other shader types, which replaced well-known tasks, the notion of a geometry shader has only recently been introduced to realtime systems, so geometry shaders do not map to anything that was possible before. Additionally, the problem being solved is conceptually very different, so a generic geometry shader will differ considerably from a typical vertex or fragment shader.
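The text above does not spell out what a geometry kernel computes. The sketch below assumes the commonly described model, in which the kernel receives one whole assembled primitive and may emit zero or more primitives; the function name, the 2D triangles and the splitting rule are all hypothetical choices made for illustration.

```python
def geometry_kernel(triangle):
    """Receive one assembled primitive; return a list of zero or more primitives."""
    a, b, c = triangle
    # Twice the signed area of the 2D triangle; zero means it is degenerate.
    area2 = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    if area2 == 0:
        return []  # discard the primitive entirely
    centroid = ((a[0] + b[0] + c[0]) / 3, (a[1] + b[1] + c[1]) / 3)
    # Amplify: split the triangle into three around its centroid.
    return [(a, b, centroid), (b, c, centroid), (c, a, centroid)]


triangles = [
    ((0.0, 0.0), (3.0, 0.0), (0.0, 3.0)),
    ((0.0, 0.0), (1.0, 1.0), (2.0, 2.0)),  # collinear, hence degenerate
]

# Unlike a vertex kernel, the output count need not match the input count.
emitted = [prim for tri in triangles for prim in geometry_kernel(tri)]
```

The contrast with the vertex case is the point of the sketch: the input is a primitive rather than a single vertex, and the amount of output varies per instance.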
Pixel shader
Main article: Pixel shader
Pixel shaders determine (or contribute to the determination of) the color of a pixel.
References and notes
- ^ According to ARB_vertex_program, a shader (specifically a vertex shader, called a program in this context) is "a sequence of floating-point 4-component vector operations that determines how a set of program parameters ... and an input set of per-vertex parameters are transformed to a set of per-vertex result parameters".
- ^ ARB_vertex_program clarifies the meaning of execution environment on issue (2) as "A set of resources, instructions, and semantic rules used to execute a program".
- ^ Direct3D10 shaders target a common shader core.
- ^ OpenGL employs a flexible way to select the unit to be replaced. The method is introduced in the OpenGL 2.1 specification, section 2.15 (Vertex Shaders) and then specialized in section 3.11 (Fragment Shaders). Additionally, EXT_geometry_shader4 introduces a new target for Geometry Shaders.
- ^ Direct3D9 builds vertex and pixel shaders using two specific functions (CreateVertexShader, CreatePixelShader). Both functions build shaders according to a precompiled token array. This token array must be built previously using ad hoc tools or using D3DXCompileShader.
- ^ Direct3D10 is essentially the same as Direct3D9 augmented with a new shader type (CreateVertexShader, CreateGeometryShader, CreatePixelShader). Also note the shader compiler is now a first class citizen (D3D10CompileShader). Only HLSL is supported.
- ^ ARB_shader_objects which was integrated in OpenGL2.0 introduced a family of Uniform* calls for this purpose. The same calls were previously introduced with ARB_vertex_program and similar ProgramParameter* calls were available for NV_vertex_program. Note that uniform values are a per-program (or per-shader) property as can be read in ARB_shader_objects issue (9) : "Should values loaded for uniforms be kept in a program object, so that they are retained across program object switches? ... YES".
- ^ Direct3D9 takes a low-level approach to shader parameters. Shaders themselves do not allow uniform values to be set; instead, a pool of device registers is used. Furthermore, since the API considers pixel and vertex resources as distinct, there are specific calls such as SetPixelShaderConstantF and SetVertexShaderConstantF. For high-level shaders, an ID3DXConstantTable can be used instead.
- ^ Direct3D10 uses a method which is similar to D3D9's but considerably more efficient and powerful. Different pools of Vertex, Pixel and Geometry resources are provided but constant values are now packed in constant buffers. In a certain sense, the previous concept of "uniform value register" is replaced by a "constant buffer reference register". Changing the value of a specific program parameter then involves mapping the constant buffer and modifying its content or - more realistically - changing the constant buffer to use.
- ^ A sampler is essentially an integer number in GLSL with special semantics provided by the compiler which is used as an 'opaque' object. According to OpenGL2.1 specification (subsection 2.14.4 Samplers): "Samplers are special uniforms used in the OpenGL Shading Language to identify the texture object used for each texture lookup. The value of a sampler indicates the texture image unit being accessed. Setting a sampler's value to i selects texture image unit number i. ... The type of the sampler identifies the target on the texture image unit. ... The location of a sampler needs to be queried with GetUniformLocationARB, just like any uniform variable."
- ^ A sampler in Direct3D9 is a special pseudo-register (sampler) to be used with texture lookup operations. Given a handle to a sampler identifier, ID3DXConstantTable::GetSamplerIndex can be used to identify the sampler unit being used.
- ^ In D3D10 HLSL, the concept of "sampler" is superseded by texture object. Using a texture object involves a syntax similar to "real" objects in C++/Java.
- ^ Cool OpenGL tips (Khronos Group presentation at GDC 2007) is likely the easiest document to read on interpolation methods for OpenGL; it comes with a few screenshots. MSDN features a D3D10-specific page.
- ^ ARB_vertex_shader reads: "A vertex shader replaces the transformation, texture coordinate generation and lighting parts of OpenGL, and it also adds texture access at the vertex level."
- ^ D3D10, the Vertex Stage.
- ^ GLSL 1.20 specification, section 2.1.
- ^ Although writing a vertex position is currently required in GLSL (specification version 1.20.8, section 7.1), geometry shaders can be used to relax this requirement somewhat. In fact, EXT_geometry_shader4 modifies section 7.1 of the GLSL specification to: "The variable gl_Position is available only in the vertex and geometry language and is intended for writing the homogeneous vertex position. ... This value will be used by primitive assembly, clipping, culling, and other fixed functionality operations that operate on primitives after vertex or geometry processing has occurred. ... Writing to gl_Position is optional. If gl_Position is not written but subsequent stages of the OpenGL pipeline consume gl_Position, then results are undefined."
In plainer terms, EXT_geometry_shader4 issue (12) explains that this is now optional not only because of Geometry Shaders, but because of Stream-Out as well.